HiLUK: Scalable Incomplete Factorization Utilizing Combinatorial Methods to Reduce Overheads

Authors

  • Joshua Dennis Booth
  • Sivasankaran Rajamanickam
Abstract

Incomplete factorizations are used to approximate the factorization of a sparse coefficient matrix A, such that A = L̄Ū ≈ LU, and are commonly used as preconditioners for iterative methods such as GMRES [7]. The approximation is normally achieved by some combination of dropping small values and/or disallowing fill-in (zero elements that become nonzero during factorization) based on levels (k) generated in the elimination tree (ILU-K). Incomplete factorizations are notorious for scaling poorly because of their low computational intensity per communication/synchronization (sync) step. As a result, very few implementations scale beyond a handful of threads. However, the increasing number of lightweight cores requires that incomplete factorizations scale so that they do not become the bottleneck in key operations such as preconditioned GMRES. In this work, we present a new incomplete factorization package, HiLUK, that uses a variety of combinatorial techniques to achieve near-linear speedups on x86 and Intel Phi. We report only ILU-0 here for brevity.

Sparse factorizations have always required the use of advanced combinatorial methods such as graph partitioning and ordering. In order to scale on current systems, a combination of these techniques needs to be used to exploit both the matrix sparsity pattern and the underlying hierarchies in modern computer architectures [1].

Synchronization. Traditional methods to parallelize incomplete factorization include factoring based on level sets, coloring, or nested-dissection (ND) orderings. However, the standard implementation of each of these techniques requires all threads to sync between levels, colors, or tree levels. These syncs can dominate the execution time, as the computational intensity of incomplete factorization is much lower than that of full factorization.
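To make the synchronization cost concrete, the following is a minimal sketch of textbook level-set scheduling, not HiLUK's own method. It assumes the strictly lower-triangular sparsity pattern is given as a list of column indices per row; the names `level_sets` and `barrier_count` are hypothetical and chosen only for illustration. Rows in the same level have no dependencies on one another and can be processed in parallel, but a barrier is needed between consecutive levels.

```python
def level_sets(lower_pattern):
    """Group rows into level sets (illustrative helper, not part of HiLUK).

    lower_pattern[i] lists the columns j < i in which row i has a nonzero,
    i.e. the rows that must be finished before row i can be processed.
    """
    n = len(lower_pattern)
    level = [0] * n
    # Rows are visited in increasing order, so level[j] is final for all j < i.
    for i in range(n):
        for j in lower_pattern[i]:
            if j >= i:
                raise ValueError("pattern must be strictly lower triangular")
            level[i] = max(level[i], level[j] + 1)

    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def barrier_count(sets):
    """Number of barriers if all threads sync once between consecutive levels."""
    return max(len(sets) - 1, 0)


# A bidiagonal pattern is fully sequential: every row depends on the previous one.
chain = [[], [0], [1], [2]]
# Two independent 2x2 blocks expose parallelism across blocks.
blocks = [[], [0], [], [2]]

print(level_sets(chain), barrier_count(level_sets(chain)))
print(level_sets(blocks), barrier_count(level_sets(blocks)))
```

In this sketch the chain pattern yields one row per level, so the barrier count grows with the number of rows, while the block pattern needs a single barrier regardless of how many independent blocks there are. This is the kind of sync overhead the paragraph above describes for level-based schemes.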
In Figure 1, we present a scatter plot of the number of rows versus the number of syncs required to factor 7 matrices reordered with ND, using 16 threads with OpenMP-style barriers and level sets. We see from the plot that the

Similar resources

A hybrid recursive multilevel incomplete factorization preconditioner for solving general linear systems

In this paper we introduce an algebraic recursive multilevel incomplete factorization preconditioner, based on a distributed Schur complement formulation, for solving general linear systems. The novelty of the proposed method is to combine factorization techniques of both implicit and explicit type, recursive combinatorial algorithms, multilevel mechanisms and overlapping strategies to maximize...

A Scalable Parallel Block Algorithm for Band Cholesky Factorization

In this paper, we present an algorithm for computing the Cholesky factorization of large banded matrices on the IBM distributed memory parallel machines. The algorithm aims at optimizing the single node performance and minimizing the communication overheads. An important result of our paper is that the proposed algorithm is strongly scalable. As the bandwidth of the matrix increases, the number...

Crout Versions of the ILU Factorization with Pivoting for Sparse Symmetric Matrices

The Crout variant of ILU preconditioner (ILUC) developed recently has been shown to have a number of advantages over ILUT, the conventional row-based ILU preconditioner [14]. This paper explores pivoting strategies for sparse symmetric matrices to improve the robustness of ILUC. This paper shows how to integrate two symmetry-preserving pivoting strategies, the diagonal pivoting and the Bunch-Ka...

On a Two-Level Parallel MIC(0) Preconditioning of Crouzeix-Raviart Non-conforming FEM Systems

In this paper we analyze a two-level preconditioner for finite element systems arising in approximations of second order elliptic boundary value problems by Crouzeix-Raviart non-conforming triangular linear elements. This study is focused on the efficient implementation of the modified incomplete LU factorization MIC(0) as a preconditioner in the PCG iterative method for the linear algebraic sy...

Effective Preconditioning through Ordering Interleaved with Incomplete Factorization

Consider the solution of a sparse linear system Ax = b when the matrix A is symmetric and positive definite. A typical iterative solver is obtained by using the method of Conjugate Gradients (CG) [15] preconditioned with an incomplete Cholesky (IC) factor L̂ [4]. The latter is an approximation to the (complete) Cholesky factor L, where A = LLᵀ. Consequently, the process of computing L̂ relies to ...

Publication date: 2016